Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC)

Authors

Masaru Kitsuregawa

Yasushi Ogawa

Abstract

The Super Database Computer (SDC) is a high-performance relational database server for a join-intensive environment, under development at the University of Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other join algorithms, a hash-based algorithm is quite efficient and easily parallelized, and has been employed by many database machines. However, in the presence of data skew, it is hard to distribute load equally among processing modules (PMs) by statically allocating buckets to PMs, as in the conventional parallelizing strategy. Thus, performance is severely degraded. In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust against data skew. While the relations are being partitioned, each bucket is further divided into fragments of the same size, and these fragments are temporarily placed on the PMs one by one. Then each bucket is dynamically allocated to a PM, which actually carries out the join of the bucket, and all fragments of the bucket are collected in the corresponding PM. In this way, the bucket spreading strategy evenly distributes the load among the PMs, and parallelism is always fully exploited. The architecture of SDC is designed to support the bucket spreading strategy; a mechanism which distributes the buckets flatly among the PMs is embedded in the hardware of the interconnection network. Simulation results confirm that the bucket spreading strategy is robust against data skew and attains very good scalability.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 16th VLDB Conference, Brisbane, Australia, 1990
Yasushi Ogawa: Research and Development Center, RICOH Co., Ltd., 16-1 Shinei-cho, Kohoku-ku,
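
The two phases described in the abstract can be illustrated with a small load-counting sketch. It is not the SDC implementation: the modulo hash, the function names, and in particular the allocation policy (largest bucket first, to the least-loaded PM) are assumptions made for illustration, since the abstract does not say how buckets are assigned. In SDC the spreading step is performed by the interconnection network hardware, whereas here it is a simple per-bucket round-robin.

```python
import heapq
from collections import Counter


def bucket_of(key, num_buckets):
    # Hypothetical hash function: plain modulo on integer join keys.
    return key % num_buckets


def static_loads(keys, num_buckets, num_pms):
    """Conventional strategy: bucket b is fixed to PM (b % num_pms) in advance."""
    loads = [0] * num_pms
    for key in keys:
        loads[bucket_of(key, num_buckets) % num_pms] += 1
    return loads


def spread_fragments(keys, num_buckets, num_pms):
    """Partitioning phase: the tuples of each bucket are handed to the PMs
    one by one, so every PM holds a near-equal fragment of every bucket."""
    next_pm = [0] * num_buckets                      # round-robin pointer per bucket
    fragments = [Counter() for _ in range(num_pms)]  # fragments[pm][bucket] = tuple count
    for key in keys:
        b = bucket_of(key, num_buckets)
        fragments[next_pm[b]][b] += 1
        next_pm[b] = (next_pm[b] + 1) % num_pms
    return fragments


def allocate_buckets(fragments, num_pms):
    """Join phase: bucket sizes are known by now, so buckets are assigned
    dynamically. Assumed policy: largest bucket first, to the least-loaded PM."""
    sizes = Counter()
    for frag in fragments:
        sizes.update(frag)                           # total size of each bucket
    heap = [(0, pm) for pm in range(num_pms)]        # (current load, PM id)
    loads = [0] * num_pms
    owner = {}
    for b, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, pm = heapq.heappop(heap)
        owner[b] = pm                                # all fragments of b are collected here
        loads[pm] = load + size
        heapq.heappush(heap, (loads[pm], pm))
    return owner, loads


# Skewed input: buckets 0 and 4 are four times larger than the others,
# and both fall on PM 0 under static allocation.
counts = {0: 40, 4: 40, 1: 10, 2: 10, 3: 10, 5: 10, 6: 10, 7: 10}
keys = [b + 8 * i for b, n in counts.items() for i in range(n)]

static = static_loads(keys, num_buckets=8, num_pms=4)
fragments = spread_fragments(keys, num_buckets=8, num_pms=4)
owner, dynamic = allocate_buckets(fragments, num_pms=4)

print("static join load per PM: ", static)    # [80, 20, 20, 20]
print("dynamic join load per PM:", dynamic)   # [40, 40, 30, 30]
```

In this toy input the most heavily loaded PM handles 80 tuples under static allocation and 40 under the spread-then-allocate scheme. The numbers only illustrate the idea and say nothing about the simulation results reported in the paper.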

Similar resources

Implementation and Performance Evaluation of the Parallel Relational Database Server SDC-II

This paper presents the implementation and performance evaluation of the SDC-II, the Super Database Computer II. The SDC-II is a highly parallel relational database server, which consists of eight data processing modules interconnected by two networks, where each module contains up to seven processors connected by two buses and four disk drives. The SDC-II employs several key techniques to effi...

Implementation and Evaluation of the Bucket Flattening Omega Network of the Parallel Relational Database Server

This paper presents the implementation and performance evaluation of the Bucket Flattening Omega Network of the SDC-II, the Super Database Computer II. The SDC-II is a highly parallel relational database server, which consists of eight data processing modules interconnected by two networks. Parallelism in the parallel relational database processing on the shared nothing architecture would suffe...

Implementation and Evaluation of the Bucket Flattening Omega Network of the Parallel Relational Database Server SDC-II

Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning

Shared-nothing multiprocessor architecture is known to be more scalable in supporting very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the data skew problem. Unless the parallel hash join algorithm includes some load balancing mec...

An Improved Hash-based Join Algorithm in the Presence of Double Skew on a Hypercube Computer

This paper presents an improved parallel hash-based join algorithm on a hypercube computer in the presence of double skew. We describe a load balancing technique to evenly distribute both join relations across all processors in order to deal with double skew effectively. Moreover, we propose a permutation join method which reduces main memory requirement for the local join operation in the previ...

Publication date: 1990